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Transform Domain Motion Estimation 

By J. A. STULLER and A. N. NETRAVALI 

(Manuscript received March 21, 1979) 

This paper introduces an algorithm for estimating the displace- 
ment of moving objects in a television scene from spatial transform 
coefficients of successive frames. The algorithm works recursively in 
such a way that the displacement estimates are updated from coeffi- 
cient to coefficient. A promising application of this algorithm is in 
motion- compensated inter frame hybrid transform- dpcm image cod- 
ing. We give a statistical analysis of the transform domain displace- 
ment estimation algorithm and prove its convergence under certain 
realistic conditions. An analytical derivation is presented that gives 
sufficient conditions for the rate of convergence of the algorithm to be 
independent of the transform type. This result is supported by a 
number of simulation examples using Hadamard, Haar, and Slant 
transforms. We also describe an extension of the algorithm that 
adoptively updates displacement estimation according to the local 
features of the moving objects. Simulation results demonstrate that 
the adaptive displacement estimation algorithm has good conver- 
gence properties in estimating displacement even for very noisy im- 
ages. 



I. INTRODUCTION 

The coefficient-recursive algorithm described in this paper estimates 
the displacement of objects in a television scene. It is a generalization 
of a pel-recursive displacement estimation algorithm recently intro- 
duced by Netravali and Robbins. 1,2 Coefficient-recursive displacement 
estimation has potential application in hybrid transform-DPCM 3,4 inter- 
frame image coders of the type discussed by Reader, 5 Roese, 6 and 
Jones. 7 The performance of a hybrid transform-DPCM interframe coder 
using coefficient recursive motion compensation is described in a 
companion paper. 8 

Before defining the coefficient-recursive displacement estimation 
algorithm, it is useful to first describe pel-recursive displacement 
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estimation. Let I(x.k, t) denote the intensity of a scene at the fcth 
sample point x* of a scan line, and let 7(x* , t — t) denote the intensity 
at the same spatial location in the previous frame. If the scene consists 
of an object that is undergoing pure translation, then, neglecting 
background, 

I(x k ,t)=I(x k -D,t-r), (1) 

where D is the displacement of the object in one frame interval t. Pel- 
recursive displacement estimation attempts to estimate D by minimiz- 
ing the squared value of the displaced frame difference, 

DFD{x k , t» = I(x k , t) - I(x k - D, t - t), (2) 

recursively with k using a steepest descent algorithm of the form: 

G* +1 = D* - V 2 e VdJDFZ)(x*, D*)] 2 , (3a) 

where V Dj is the two-dimensional gradient operator with respect to T)k . 
Carrying out this operation in (3a) and using (2) yields 

f>k +i - ft* - e DFD(x k , D A ) V/(x* - Da, t - t), (3b) 

where V = V, is the two-dimensional spatial gradient operator with 
respect to horizontal and vertical coordinates X\ and X2 in x = (jti, X2) T : 



V/(XA-b*.<-T) = 



a 

dXi 

d 
dX 2 



/(x, t - r)}~_6 4 . (4) 



Superscript T denotes transpose of a vector or matrix. The pel-domain 
interframe coder of Netravali and Robbins predicts intensity 7(x*, t) 
by the displaced previous frame intensity /(x* — D*, t — r) using 
interpolation for nonintegral D*. If the magnitude of the prediction 
error exceeds a predetermined threshold, the coder transmits a quan- 
tized version of DFD(Xk, t)/,) and address information to the receiver. 
Both receiver and transmitter then update t)k according to (3b) using 
this quantized version. Netravali and Robbins 2 found that a coder 
using this algorithm consistently obtained bit rates that were 30 to 60 
percent lower than those obtained by "frame-difference" prediction, 
which is commonly used in interframe coders. 

In an interframe hybrid transform-DPCM coder, 5 " 7 individual frames 
of video are partitioned into blocks having dimension N r rows by N c 
columns, and a two-dimensional transform is performed on each block 
to produce a set of coefficients. The transform coefficients of the qth 
block of the present frame are predicted by the corresponding coeffi- 
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cients of the qth block of the previous (reference) frame and, if the 
prediction error is sufficiently high, the quantized prediction errors are 
transmitted to the receiver. These quantized errors add as correction 
terms to the coefficients predicted by the decoder, which inverse- 
transforms the result to obtain the decoded image. This process repeats 
with both coder and decoder predicting the transform coefficients of 
the next frame by the coefficients of the decoded frame, as illustrated 
in Fig. 1. In this type of codec, data compression is achieved both by 
the redundancy reduction implicit in the prediction process and by the 
fact that some coefficients can be reproduced with low precision (or 
totally omitted) without visibly degrading the reconstructed image. An 
advantage of interframe hybrid transform- dpcm coding over conven- 
tional (3-dimensional block) interframe transform coding 9 is the fact 
that the hybrid coder requires only a single frame of storage while the 
conventional transform coder requires many. 

In a motion-compensated hybrid transform-DPCM coder of the type 
envisioned (Fig. 2), the nth coefficient of the qth present-frame block 
would be predicted by the nth coefficient of the displaced qth block of 
the previous frame where the displacement is a recursively updated 
estimate of frame-to-frame translation of the moving object. In this 
paper, we introduce and analyze a displacement estimation technique 
that operates recursively on coefficients in a manner analogous to the 
way (3) operates on pels. 

Section II of this paper defines the coefficient-recursive displace- 
ment estimation algorithm for any real linear transform and gives 
illustrative simulation results using a separable 2-row by 8-column (2 
X 8) transform block. A statistical analysis of the algorithm is given in 
Section III. In the analysis of Section III, a single frame is modeled as 
an image drawn at random from a stationary and ergodic ensemble of 
images. This random sample is assumed to be undergoing pure trans- 
lation from frame to frame. An important result of this analysis is 
stated in Assertion 3 of Section 3.2, which says that, under certain 
conditions, the convergence properties of the coefficient-recursive dis- 
placement estimation are independent of the transform used. Section 
III presents simulation results that support this claim using Hadamard, 
Haar, and Slant transforms. Section IV describes an adaptive version 
of the coefficient-recursive algorithm and presents simulation results 
that indicate that this version can be used to some advantage in 
displacement estimation for noisy images. Illustrative simulation re- 
sults are shown here using a 2 X 4 cosine transform block. 

The algorithms discussed in this paper are local in nature and, as 
such, can estimate the individual frame-to-frame displacements of 
several objects that may be present in the television scene. However, 
we emphasize that all results presented here apply to objects undergo- 
ing pure translation; other types of motion are applicable to this study 
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to the extent that these can be approximated by pure translation over 
the spatial dimensions of a transform block. Background uncovered by 
moving objects is also ignored throughout this paper. In spite of the 
approximations involved, simulation results to be described in Ref. 8 
show that the coefficient-recursive displacement algorithm studied 
here can be substantially beneficial when used in an interframe hybrid- 
dpcm codec operating on real-life scenes. 

II. COEFFICIENT-RECURSIVE DISPLACEMENT ESTIMATION 

Let a field of video be partitioned into rectangular blocks of pels, 

each having dimension N r rows and N c columns (N r X N c ). Let x 9 = 

(x\ q , X2q) T denote the coordinates of the upper left-hand pel of the qth 

block, where the blocks in each row of blocks are numbered from left 

to right with q = 0, 1, 2, • • • . We number the N = N r N c pel intensities 

of block q in a column-scanning fashion and denote them by a column 

vector I{x. q , t). Let the N component vector <f>„ be the nth basis vector 

of a nonzero but otherwise arbitrary real linear transform, and denote 

the nth coefficient of the qth block of this transform in the present 

frame by c n (q), where 

c n (q) = I r (x„ t)<f>„ (5) 

and n is numbered from to N - 1. The displaced previous frame 
value of this coefficient is 

c n (q, D) = I T (x, - D, t - T)<t> n , (6) 

where I(x (/ - D, t - r) is the column vector of intensities of the 
displaced gth block of the previous frame and D is the estimated 
displacement of the moving object. Computation of the elements in 
Kx, — D, t — t) generally requires an interpolation among the given 
previous-frame pel intensities. Prediction of c„(q) of (5) by c n {q, D) of 
(6) results in coefficient prediction error 

e„(q, D) = [I(x 9 , t) - I(x 9 - D, t - T)] r «J>„. (7) 

The algorithm defined in this section attempts to decrease the squared- 
prediction errors e(q, D) 2 in a coefficient-recursive manner by steepest 
descent iteration of the form 

D„ +1 ( 9 ) = t>„(q) - | Vf>„we 2 n(q, f> n (q)) 

= D n (q) ~ ee n (q, D n (q))G n (q) (8a) 

for n = 0, 1, • • •, M-2, Af ^ N, and q = 0, 1, 2, • • •, with 

t)o(q) = D M -!(9 " 1) 

- ee M -Aq - 1, Da/-i(9 - D)G M -Aq - 1). (8b) 
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In (8), G„(q) is the coefficient gradient vector 

G n (q) = VF(x 9 - D„( 9 ), t - r)4> n . (9) 

Note that (8) operates upon coefficients through M — 1, where for 
generality we assume M s£ N. Iteration in (8) progresses as follows. 
The initial displacement estimate of the qth block (q > 1), t)o(q) is 
formed by updating the final displacement estimate f)M-i(q — 1) of 
the previous block as in (8b). The next displacement estimate of the 
<7th block t)i(q) is formed from (8a) with n = 0. Iteration progresses in 
the qth block by (8a) with n = 1, 2, • • • , M — 2, resulting finally in 
displacement estimate Dm- i(q) which, when updated in (8b) (with 
q — * q + 1), forms the initial displacement estimate Do(<7 + 1) of block 
q + 1. This iteration procedure continues along all horizontal blocks of 
raster. The procedure is started in the q = block with an arbitrarily 
chosen initial displacement estimate Do(0) followed by iterations of 
(8a) for n — 0,1, • • • , M — 2 and q = 0. In the sequel we assume that 
Do(0) is zero. 

The envisioned motion-compensated interframe hybrid transform- 
dpcm coder transmits a quantized version of coefficient prediction 
error e„(q, t)(q)) to the receiver whenever the magnitude | e n (q, D(q) | 
exceeds a threshold, thereby enabling the decoder to update its dis- 
placement estimate t)„(q) as in (8) as well as correcting its prediction 
c n (q, D „(q)) of coefficient c n {q). Both encoder and decoder use the 
updated displacement estimate in predicting the next coefficient, and 
the process continues. A simplified block diagram of the system that 
omits the thresholding operations is given in Fig. 2. 

In the sequel, it is convenient to rewrite (8) in a form that explicitly 
describes the iteration convention. This can be done by defining a 
single index i, 

i = qM + n; i = 0, 1, 2, ... (10a) 

that equals the total number of iterations of (8) that have occurred in 
iterating from D (0) to t) n {q). Quantities q and n are related to i by 

n = ((i)) (10b) 

q = [ML (10c) 

where we use the notation ((i)) to denote i modulo M and [[*]] to 
denote the integer part of i/M. 

Using (10), we set D, A t) n (q) and rewrite (8) as 

D, +1 = D, - «?«,„([[*']]), D,)G (( »,([[i]]) (11) 

with i = 0, 1, 2, .... Note that the Netravali-Robbins pel-recursive 

1678 THE BELL SYSTEM TECHNICAL JOURNAL, SEPTEMBER 1979 




1 





\ 



























lu <r 5 

Mh< 
LU tO CC 

Q- O 
O 




-J 























• 













-D 

-Q 

T3 

5 



a 

ID 

| 

GO 



fc 



TRANSFORM DOMAIN MOTION ESTIMATION 1679 



displacement estimation algorithm (3b) is a special case of (10) that 
results for a transform "block" having dimension 1 pel by 1 pel and 
single "basis vector" <fo = 1. 

We emphasize that recursions (8) and (11) were derived with the 
objective of decreasing the squared coefficient prediction errors of a 
hybrid transform-DPCM codec. As shown in Section III, coefficient 
prediction error is related to displacement estimation error (approxi- 
mately) by a dot product between the displacement estimation error 
vector and vector G„ (q ) that describes the spatial rates of change of 
the coefficient estimate c„(q, t)„(q)) with respect to small displace- 
ments of the block. Therefore, only the component of displacement 
estimation error in the direction of G n (q) contributes to coefficient 
prediction error, and it is this component that is relevant in evaluating 
the performance of (8) or (11). For this reason, experimental results 
given in this paper refer to the component of displacement estimation 
error measured in the direction of its corresponding coefficient gradient 
G„(<7). 

Experimental illustrations of the behavior of (11) are given in Figs. 
3 through 5 where the moving object was the synthetically generated 
pattern of Fig. 6, displaced 2 pels in the horizontal direction each frame 
interval t. This is a radial cosine function having a radius of 60 pels, 
and peak-to-peak amplitude 220 (out of an intensity range to 255) at 
its center, decreasing to 130 at the circumference. The period P 
decreases with radial distance R starting with a period of 20 pels at 
center to 10 pels at the circumference. The pattern is described 
mathematically by the intensity function 

f{R) = 100 exp(-0.01 R)cos{2ttR/P) + 128; < R < 60, (12a) 
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Fig. 3— Single-line convergence results using 2x8 separable Hadamard transform 

with e = 10~ s . 
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Fig. 5— Single-line convergence results using 2x8 separable Hadamard transform 
with e = 10 \ 



where 



P = (1 - R/60.) 10 + 10. 



(12b) 



This function is displayed on a 256 X 256 element raster in two 
interlaced fields of 128 lines each. In applying (11), the spatial trans- 
forms were taken over a single field with coefficient prediction per- 
formed from the corresponding field separated in time by a frame 
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Fig. 6— Synthetic image used in simulations. This image is described by eq. (11) of 
Section II. 

interval t. Figures 3 through 5 show displacement estimation error in 
the direction of the spatial gradient of the corresponding coefficient 
versus iteration for a sequence of 2 X 8 blocks located 10 field lines 
above the center of the figure. Iteration was initiated with a zero 
displacement estimate approximately 7 pels within the pattern for 
each horizontal sequence. In these examples and all others presented 
in this paper, the two-dimensional transforms concerned were sepa- 
rable transforms of the form 10 



C = V[I]H, 



(13) 



where C is the N r x N c coefficient matrix, [J] is the N r X N c matrix of 
pels, and V and H are unitary matrices having dimensions N r X N r 
and N c X. N c , respectively. Coefficient c n (q) of (5) is the nth column 
scanned coefficient of matrix C, with [/] taken to be the qth pixel block 
of the present frame. In Figs. 3 through 5, V and H were the normalized 
sequency-ordered 2x2 and 8x8 Hadamard matrices of Fig. 7, and 
iteration of (11) progressed through all M = 16 coefficients in a block 
(i.e., M= N). Figure 3 illustrates the behavior of (11) for e = 10 -5 . It 
can be seen by inspection of this figure that displacement estimation 
error tends to decrease roughly in a series of steps of 16 iterations. 
Iterations 1 to 6, 17 to 22, etc., corresponding to the first few (low 
sequency) vectors in {<>,,} tend to affect error significantly, while the 
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iterations corresponding to the higher sequency basis vectors do not. 
This type of behavior is scene-dependent and is investigated in the 
analysis of Section III. Figures 4 and 5 illustrate convergence for 
increased values of e. In general, convergence rate increases gradually 
with increasing e up to a point after which oscillations occur. Conver- 
gence to within 0.5-pel error was achieved for this particular example 
in two or three iterations for e = 10" 4 while the recursion became 
oscillatory (and eventually unstable as in Fig. 5) at values exceeding 
this. All these results are scene-dependent and, to some extent, de- 
pendent upon the row position of the sequence of blocks. Because of 
this, the behavior of (11) will be examined from a statistical viewpoint 
in Section III. 



III. PROPERTIES OF COEFFICIENT-RECURSIVE DISPLACEMENT 
ESTIMATION 

The displacement estimation procedure defined by (8) is a nonlinear 
recursion relation whose dynamic behavior is complicated by the fact 
that the error e n (q, t) n (q)) 2 is generally a multimodal function of 
f)„(q), having global minimum at t)„(q) = D and local minima else- 
where. Convergent solutions to eq. (8) can, therefore, exist at displace- 
ment estimates other than the true displacement D. Subsequent anal- 
ysis will restrict consideration to the case in which the displacement 
estimate is sufficiently close to the true displacement that e 2 (q, t)) can 
be approximated as a quadratic function of D — D. Under this restric- 
tion, the anomalous solutions can be ignored, and (8) reduces to an 
approximately linear stochastic recurrence relation. 

Section 3.1 derives the linearized approximation to (8) and the 
associated quadratic error expression. The dynamic behavior of the 
coefficient-recursive displacement estimator is analyzed in Section 3.2. 
An important result of this section is that, for e sufficiently small, the 
block-to-block convergence rate of mean displacement estimation error 
resulting from (8) is independent of the transform used for any unitary 
transform. 



0.707 0.707 
0.707 - 0.707 



0.353 0.353 0.353 0.353 0.353 0.353 0.353 0.353 

0.353 0.353 0.353 0.353 - 0.353 - 0.353 - 0.353 - 0.353 

0.353 0.353 - 0.353 - 0.353 - 0.353 - 0.353 0.353 0.353 

0.353 0.353 - 0.353 - 0.353 0.353 0.353 - 0.353 - 0.353 

0.353 - 0.353 - 0.353 0.353 0.353 - 0.353 - 0.353 0.353 

0.353 - 0.353 - 0.353 0.353 - 0.353 0.353 0.353 - 0.353 

0.353 - 0.353 0.353 - 0.353 - 0.353 0.353 - 0.353 0.353 



(a) (b) 

Fig. 7— Sequency-ordered Hadamard matrices, (a) 2 x 2. (b) 8 x 8. 
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3.1 Linear analysis 

Assume that the pel intensities are samples of an object that is 
undergoing pure translation D from frame to frame as in (1) so that, 
neglecting background, 

I(x„ t) = l(x q -D,t- t). (14) 

For Euclidean norm || £>„ (q ) — D || sufficiently small, eq. (7) becomes, 
by Taylor's expansion about D — t){q), to a linear approximation 

e n (q, t> n (q)) = [Kx, - D, t - t) - I(x, - f> n (q), t - T)] T <t> n 

a Gftq) ^iq), (15) 

where G„ (q ) is given by (9) and A„ (q ) is the displacement estimation 
error A„(g) = D n (q) — D. Using the approximation in (15), we can 
approximate the squared coefficient estimate error by 

e 2 „(q, t>n(q)) « &Z(q) [G n (q) GUq)]*n(q), (16) 

which is a quadratic function of the horizontal and vertical components 

of *n(q). 

In terms of approximation (15), (8) assumes the form 

A„ +1 ( 9 ) = [U-e G n (q)GZ(q)]* n (q) (17a) 

with 

Aoto) = [U-e Gm-i(9 - DG&-, (q - l)]±u-i(q - 1), (17b) 
where U is the 2x2 identity matrix. Similarly, (11) becomes 

A, +1 = [U - € G ((I „([[i]])G^,)([[i]])]A I , i - 0, 1, ... , (18) 

where A, A A„(q). 

Considering the image as a random process, (18) is a stochastic 
recurrence relation. Equations similar to, but somewhat simpler than, 
(18) have appeared in the problem of adaptive tap gain adjustment of 
automatic channel equalizers. Unfortunately, a complete statistical 
description of the behavior of these simpler equations has not yet been 
obtained. The difficulty in analyzing these equations is that their 
solution depends upon products of matrices that are statistically de- 
pendent. It has been found, however, that useful approximate results 
can be obtained by treating the dependent matrices as if they were 
actually independent. 11 We use this method in Section 3.2 to analyze 
(18). As shown in Section 3.2 and Appendix A, there is some analytical 
justification for this approach because of certain properties of the 
transforms conventionally used in image coding. Further justification 
is given in the asymptotic analysis of Appendix B. 

1684 THE BELL SYSTEM TECHNICAL JOURNAL, SEPTEMBER 1979 



3.2 Statistical analysis 

This section studies the behavior of mean displacement estimation 
error under the assumption that the sequence of gradient vectors 
G((i))([[i]]) entering (18) are statistically independent ("independence 
assumption"). Note that, if the sequence of G«,-))([[*]]) in (18) are 
independent, then A, will be independent of the matrix premultiplying 
it. This follows from the fact that A, is determined by the G«>))([[/]]), 
j < i, which, by assumption, are independent of G((,»([[j]]). The 
ensemble mean of (18) (denoted by the overhead bar) then becomes 



2m = [U - fGuojGfirri&i, i = 0, 1, 2, • • • . (19) 

In writing (19), we have use d (9) and as sumed a stationary ergodic 
image ensemble. The matrix G n (q)G n (q) in t his case wi ll not depend 
upon Xy — t)„(q) and is written simply as G«,-»G(<,-». The matrix 
G((f))G ?(i)) is periodic in i with period M, having valu es spe cified by 
G n Gn, n = 0, 1, ••• M — 1. Alternative expressions for G„G„ are 



G„GJ = VFdfeMtfVlCi) (20a) 

and 



G n G„ — 



<j> n R\2<}>n tynRl&n 



(20b) 



where R\, R> and R\> are auto- and cross-correlation matrices of I/dxi 
and l/dxo, and 



V r = 



d _d_ 

dXi ' 8X2 



Equation (19) can be interpreted in terms of an optimum (Wiener) 
displacement estimator. Consider the mean square nth coefficient 
prediction error resulting from a given displacement error A„ (q ) = S. 
To the linear approximation of (15), this is 



F n (d)=8 T GnGnS, (21) 

which is a quadratic function of the components of 8. A steepest 
descent algorithm for arriving at the minimum of the F n (8) for 
n - 0, 1, • • • , M - 1 is 



8 l+i = [U-eG (il „G (U „]8„ i = 0, 1, 2, ■ ■ . . (22) 

Comparing (19) and (22), we see that, under the assumption of mu- 
tually independent G<(,-i>([[i]]), the mean displacement estimation error 
A, satisfies a recursion (22) that minimizes mean-square prediction 
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error of coefficients n = 0, 1, • • -, M — 1. The convergence of (19) and 
(22) is established below in Assertion 1. 

We emphasize that recursion (19) describes the progression of mean 
displacement estimation error under the assumption of independent 
G«,))([[i]]). The conventional approach to transform image coding is 
to choose basis vectors {<>„} so that the transmitted coefficients (or 
the prediction errors of these coefficients) are as mutually "indepen- 
dent as possible." 12 This is best achieved by choosing the basis vectors 
{<>„} of the transform to be the eigenvectors of the covariance matrix 
(i.e., the Karhunen-Loeve basis) of the block of pels. This basis results 
in transform coefficients that are linearly independent (uncorrected) 
within the transform block, although dependency among coefficients 
from block to block may persist. Other transforms, such as the cosine 
or Hadamard transforms can be viewed as practical approximations to 
the Karhunen-Loeve transform. Assuming that the Karhunen-Loeve- 
basis vectors result in coefficient gradients that are linearly indepen- 
dent as well, this would help justify the application of independence 
theory in describing the behavior of (18). In Appendix A we show that 
this assumption in indeed correct for the stochastic image model most 
widely applied in image processing analyses. Some insight into the 
behavior of (18) for dependent G((,»([[i]]) is given by the analysis in 
Appendix B. 

Assertion 1: Under the independence assumption, mean displace- 
ment estimation error in (19) converges to zero if and only if the 
eigenvalues of ^ are inside the unit circle, where 



*- II [U-eGnGZl (23) 

n=0 



In ou r prod uct notation (23), matrix U — eGoGo is premultiplied by 

U - eGiGl,etc. 

Proof: This can easily be shown by iterating (19) from i = to i = qM 

+ n - 1: 



*qM+n — 



qM+n-l 



II [U-eG wn GTu»] 



A 



= [U - eGn-iGLi] ---[U- eGoG^Ao. (24) 

Therefore, the behavior of & Q M+n as q increases depends upon the 
matrix ^ as ty q , and Assertion 1 follows. QED 

A useful sufficient condition for convergence of (19) is given in the 
following. 

Assertion 2: Under the independence assumption and for any nor- 
malized set of basis vectors, mean displacement estimation error in 
(19) is bounded by 
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k</Af+n 



|Ao|| (25) 



where k, (0 < k < 1) is the maximum eigenvalue of matrices U — 
eG n Gl, n = 0, 1, • • • M - 1, and 

2 

< e < — = — =^=^^ . (26) 

N[(dI/d Xi ) 2 + (dl/dx 2 ) 2 ] 
Proof: Using the Schwartz inequality on (24): 

||£,* + »|| * ||I/-€G JI -,GZ , -,|| • • • || U- €G^?||||*||«||Ao||, (27) 

where || ( • ) || for a symmetric matrix is the magnitude of the maximum 
magnitude eigenvalue. Similarly, 

Af-l 



1*11* n |tf-«CUtt|- (28) 

n-0 



The eigenvalues of U — eG„Gj have the form 1 — e\l? where the Xjf\ 
i = 1, 2, are the (nonnegative) eigenvalues of G„Gj. Let k be the 
largest of the norms in (28) and let X ma x be the largest of the \«' for 
^ n ^ M — 1. Then excluding trivial cases, each matrix norm in (27) 
and (28) will be less than unity for < e < 2/\ max , and at least q of the 
norms in (27) is k. Therefore, 

|| £„*+» 11**1*0 1| (29) 

for < e < 2/Xmax. 

Assume that X ma x corresponds to G„Gj for n = k. Then using (20) 
and the fact that the eigenvalues of a nonnegative-definite matrix are 
bounded by the trace of the matrix gives the chain of inequalities 



7>[G*GJ] 



< N[(dl/dxiy + (dl/dx 2 ) 2 ] . (30) 

Therefore e in the range (26) guarantees < e < 2/A max which, in turn, 
guarantees (29). QED 

Allowing for variations in scene statistics, a conservative choice of 
e would be somewhat less than 2/Ama X . In this event, the following 
assertion applies. 

Assertion 3: If iteration is taken over all coefficients (i.e., M = N) of 
any complete orthonormal basis {tft„} , and if e is small co mpared to 
2/Amax where A max is the maximum eigenvalue of matrices {G„Gj; n 
= 0. •••, N — 1), then the block-to- block convergence of A, is 
independent of the particular basis set used. Furthermore, the con- 
vergence rate is independent of block dimensions N r and N c . 
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Proof: The block-to-block dynamics of A, is determined by the matrix 
^ of (23). For < e <c 2/A max , and for M = N, V can be approximated 
by 



* = U-e I G„G„ r . 

n=0 



From (20a), 



I G„G;T = VF(x) 



tv=t 



X <f>n<?>n 



V r I(x). 



(31) 



(32) 



But since {<£„} is a complete orthonormal set, ££Lo <$>nfyn = U, the A7- 
by-iV identity matrix, and 



* = £/- e Vr(x)V'I(x), 



which is independent of {</>„} . 

The expectation in (33) is given by 



VI T (x)V r I(x) = 



2 ( az/ax,) 2 S (di/d Xi w /dx 2 ) 1 

2 oz/axooz/d^) £OZ/ta 2 ) 2 J* 



(33) 



(34) 



where the summations are over the N pels of a block. By stationarity, 
the expectation of each term in (34) is a constant and (33) becomes 



* = U - eNF 
x(U-eD N , 



where 



r = 



(di/d Xl y 



(dI/dxMdI/dx 2 ) 



(dI/dxM dI/dx 2 ) 
(dl/dx 2 ) 2 



(35) 



(36) 



Therefore q block-to-block iterations of (19) premultiplies A by (U 
— eT) QN , which is a function only of the total number of iterations qN 
and is independent of basis and block dimension. QED 

Experimental evidence of Assertion 3 is shown in Figs. 8a to 8d. 
Figure 8a shows the relevant component of displacement estimation 
error versus iteration number averaged over the interior of the moving 
cosine pattern of Fig. 6 for a pel-recursive (unity) 1 X 1 transform and 
a 1 X 8 Hadamard transform of the type in Fig. 7b using e = 5 X 10 -5 . 
For each scan line entering the average, iteration of (11) was initiated 
with displacement estimate 6 = just inside the circumference of the 
pattern. In spite of the disparity between block size and transform 
type, the block-to-block convergence rate (measured over spans of 
eight iterations) of the Hadamard estimator closely matches that of 
the pel-recursive estimator. Although Assertion 2 concerns average 
displacement error, we found that it often applied as well to individual 
scan lines as shown in Figs. 8b to 8d. These figures show the relevant 
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Fig. 8— Convergence results for various transforms with € = 5 X 10~ n ; (a) Pel-recursive 
and Hadamard 1x8 transform average relevant displacement estimation errors vs 
iteration number, (b) Pel-recursive and Hadamard 1x8 transform relevant displacement 
estimation errors for scan line through middle of moving cosine pattern, (c) Hadamard 
1 X 4 and 1 X 2 transform relevant displacement estimation errors for scan line through 
middle of moving cosine pattern (continued). 
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Fig. 8 — (continued) (d) Haar 1x8 and Slant 1x4 transform relevant displacement 
estimation errors for scan line through middle of moving cosine pattern. 



component of displacement estimation error versus iteration number 
for the scan line running through the middle of the moving cosine 
pattern. Hadamard 1 X 2, 1 X 4, 1 X 8, Slant 1x4, Haar 1x8, and 
pel-recursive estimators are all seen to have similar convergence rates 
with e = 5 X 10~ 5 when measured over the appropriate span of 
iterations. This also applies to the cosine transform (not shown), which 
was found to behave similarly to the Hadamard transform. 

Figures 9a to 9c compare convergence of the 1x8 Hadamard block 
and pel-recursive displacement estimators as e increases. The image 
data in this case was also the middle scan line of the moving cosine 
pattern. It can be seen that the convergence rates of the Hadamard 
and pel-recursive estimators are in rough agreement for increasing e 
up to the point where osculations occur (Fig. 9c). Note that osculations 
occur in the Hadamard estimator before occurring in the pel-recursive 
estimator. This behavior was found to be the case for other transform 
types as well. 

Although the block-to-block convergence rate of transform domain 
displacement estimators is substantially independent of the transform 
type, this is clearly not the case for within block convergence rate, as 
evidenced by Figs. 8 and 9. An explanation of this is given from the 
form of ^ in (23), in which par ticular basis vector <>„ contributes a 
matrix factor of the form [ U — eG n Gj ] . This contribution of <£„ to 
reducing aver age dis placement estimation error depends upon the 
eigenvalues of G„G„ which are a measure of the statistical "match" 
between <f> n and the spatial rates of change of the scene. 

It is possible to vary e with n (e = e n ) to pa rtially compensate for 
differences among the eigenvalues of matrices G„Gj. However, this 
technique can also make the algorithm move sensitive to noise that 
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Fig. 9— Convergence results for pel-recursive and 1x8 Hadamard displacement 
imators. (a) e = 1 x 10 4 . (b) € = 2 x 10" 4 . (c) e = 4 x 10~\ 
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may be present in the image data. Section IV gives another approach 
that appears to have particularly good noise rejection properties. 

IV. ADAPTATION 

This section shows how the coefficient displacement estimation 
algorithm of Sections II and III can be improved by adaptively updat- 
ing the displacement estimate according to the local features of the 
image. This is a technique that is not possible for the pel-recursive 
estimation. Adaptation in displacement estimation is motivated by the 
recognition that single frames of video are neither noiseless nor best 
described as stationary processes. Simulation results using noise-cor- 
rupted versions of the radial cosine object of Fig. 6 demonstrate that 
an adaptive algorithm of the type described here can have better 
convergence properties than either pel-recursive or nonadaptive coef- 
ficient recursive displacement estimation. 

4.1 Preliminaries 

The potential advantage of adaptation in (8) can be seen by consid- 
ering the simple example of the moving edge scene of Fig. 10. This 
edge has constant slope g = 3.8 intensity increments per pel-to-pel 
distance over a width of 50 horizontal pel intervals, and velocity 2.7 




Fig. 10 — Synthetic moving edge pattern. 
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pels horizontal per frame. Within the width of the edge, G n (q) of (9) 
is given by 



Gn(q) = 



ggg •" g 
000 ... 



*». (37) 



For a sequency-ordered basis set {</>„} , G n (q) will be zero for all ac- 
basis vectors with the result that the corresponding displacement 
estimate updates will be determined solely by noise that may be 
present on the edge. Updating displacement estimate by iterating over 
these basis vectors can only increase estimation error. The dc-basis 
vector (1/VN), however, results in 

G n (q) - ( g ^J , (38) 

which provides signal-dependent terms in (18) proportional to VN. 
This suggests that, for this example, it may be better to iterate 
repeatedly (say M times) over the dc-basis vector than to sequence 
through the M-basis vectors. We have not been able to analyze 
rigorously the performance of such dc-basis iteration on the moving 
edge in the presence of additive noise. In Appendix C we assume that 
the additive noise is white with power a 2 w , and invoke certain assump- 
tions regarding the independence of noise terms entering the recur- 
rence relation. The result is the following approximate expressions for 
the horizontal component of displacement estimation error mean t)a(i) 
and steady-state variance o\. 

tjaU) ~ A(0)(1 - PY; i = 0, 1, 2, ... (39a) 

•l-W^jjjfij. + ^l, (39b) 

where /? = eNg 2 and a = a'i/Ng 2 . 

Not too surprisingly, expressions (39) indicate that, for constant rate 
of convergence (i.e., constant /?), the steady-state displacement esti- 
mate error variance decreases inversely with block size iV*. This points 
out a possible advantage of transform domain displacement estimation 
compared to pel-recursive displacement estimation where the dc-basis 
block size is constrained to have dimension N = 1. Figures 11 through 
13 show experimental and theoretical behavior of the horizontal com- 
ponent of displacement estimation error for pel-recursive and 2x8 
dc-basis iteration on the moving edge. In obtaining the sample mean 
7ja(i) and sample variance a\, averages were taken over 128 field lines. 

Experimental and theoretical results are seen to compare favorably 
in the pel-recursive estimator case (Fig. 11) at a signal-to-noise ratio 
(snr) = 45 dB. In Fig. 12, which describes displacement errors in the 
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Fig. 11 — Horizontal component of displacement estimation error for noisy moving 
edge (snr = 45 dB) using pel-recursive displacement estimation, e = 0.02. 
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Fig. 12 — Horizontal component of displacement estimation error for noisy moving 
edge (snr = 35 dB) using pel-recursive displacement estimation, e = 0.02. 
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pel-recursive estimator at 35 dB, there is approximate agreement 
between theory and experimental data. This is also the case for the 2 
X 8 dc-basis results of Fig. 13. From the experimental data in Figs. 12 
and 13, it can be seen that the "dc-iteration" is more effective than 
pel-recursive estimation in combating the effects of noise in the dis- 
placement estimation of a moving edge. 

The theoretical and experimental behavior of steady-state displace- 
ment error variance ai(oo) versus /? is plotted in Fig. 14 for pel-recursive 
displacement estimate for the moving edge pattern with snr = 35 dB. 
The range 1 < ft < 2 represents oscillatory convergence with ai(oo) 
increasing rapidly as ft approaches 2. Note the trade-off between 
convergence rate and accuracy evidenced by Fig. 14 and eq. (39a). 

4.2 Adaptation algorithm 

The adaptive displacement estimation algorithm proposed here 
updates the displacement estimate (8) using that basis vector ij> m o 
whose projection onto the computed coefficient gradient of the refer- 
ence frame has maximum amplitude. At each iteration step (n, q), the 
magnitude of the coefficient gradient vector || G n (q) || is computed from 



SNR=35dB 




-1.0 



iili) (SINGLE SAMPLE)- 



10 15 

ITERATIONS WITHIN EDGE 



20 



25 



Fig. 13 — Horizontal component of displacement estimation error for noisy moving 
edge (snr = 35 dB) using 2 X 8 dc basis iteration. € = 0.00125. The factor € was adjusted 
in this experiment to result in an identical average convergence rate as that of Figs. 11 
and 12. 
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the noisy previous frame data for each vector in the given basis set 
{<J>„,; m — 0, 1, • • • M — 1}. The particular basis <J> m0 maximizing this 
quantity is then used for the displacement update. 

Figure 15 compares the performance of adaptive and nonadaptive 
displacement estimation using a separable 2x4 cosine transform and 
e = 10 -r ' for the test image of Fig. 16. This image consists of the moving 
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Fig. 14 — Estimation error standard deviation vs rate parameter /?: pel-recursive at 
snr = 45. 
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Fig. 15 — Displacement estimation errors for noisy synthetic image of Fig. 16. 
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object of Fig. 6 corrupted by additive white Gaussian noise at 20-dB 
snr. Also shown are results for pel-recursive displacement estimation 
at e = 10~ 8 and a convergence-rate-optimized e = 4 x 10 . The pel- 
recursive scanning pattern was chosen according to Fig. 17 to match 
the rate of progression of the pel-recursive algorithm along individual 
scan lines to that of the 2x4 transform algorithm. (For example, after 




Fig. 16— Noisy test image, snr = 20 dB. 
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Fig. 17 — Pel-recursive scanning pattern applicable to test results of Fig. 15. 
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40 iterations, both pel-recursive and coefficient-recursive displacement 
estimators will have traversed 20 columns of a given scan line.) All 
results of Fig. 15 are averages of relevant displacement error over the 
interior of the moving object. It can be seen from Fig. 15 that the 
adaptive 2x4 cosine displacement estimator has clearly superior 
average error convergence rate than either the pel-recursive or non- 
adaptive cosine displacement estimators. No choice of e was found 
that could improve the convergence rate of the pel-recursive estimator 
beyond that shown for e = 4 X 10~ 5 . We also computed the experimen- 
tal standard deviations of relevant displacement estimation error. 
Nonadaptive cosine and pel-recursive estimators had experimental 
error standard deviations of 0.50 and 0.51, respectively, at e = 10~ 5 . 
The error standard deviation of the adaptive coefficient-recursive 
displacement estimator was 0.54, while that for the rate-optimized pel- 
recursive estimator was an inferior 0.67. 

The above results demonstrate a potential advantage of adaptive 
coefficient-recursive displacement estimation over both pel-recursive 
and nonadaptive coefficient-recursive displacement estimation. It re- 
mains to be established, however, whether the adaptive scheme of this 
section will improve the performance of a motion- compensated hybrid 
transform-DPCM coder. 



V. SUMMARY 

This paper has introduced a coefficient-recursive displacement es- 
timator having potential application in motion-compensated inter- 
frame hybrid transform-DPCM image coders. The convergence of the 
mean displacement estimate to the true displacement was established 
in Assertions 1 and 2 using assumptions that are supported by the 
analyses of Appendices A and B. Assertion C described conditions 
under which the rate of convergence of mean displacement estimation 
error is independent of the transform block size and type. An extension 
of the coefficient recursive algorithm was given in Section IV and 
shown by simulation to have improved convergence properties in the 
displacement estimation of noisy objects. 



APPENDIX A 

This appendix verifies a statement in Section III concerning the 
orthogonality of coefficient vectors G n (q) /i = 0, 1, ••• M — 1 for a 
separable Markov image model. This model is described as follows. 
Let I m , n denote the intensity of the pel located in the mth row and nth 
column of the raster. Then for the Markov image model treated here: 
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/„,„/,, = a 2 pl m -"pr y| , (40) 

with < | p r | < 1, < | p c | < 1. In (40), a 2 is the intensity variance, and 
p r and p c are the correlation coefficients of adjacent pels between rows 
and columns, respectively. If the pel intensities in a N r row by N c 
column block are indexed in column scan fashion and denoted by a 
column vector I, then it can be shown that 



I.I T =a 2 M r xM c , (41) 

where X denotes the Kronecker product, 13 and M r and M c are, respec- 
tively, N r X N r and N c X N c Toeplitz matrices having i/'th entries 

[M r ]„ = pl'-" 
and 

[M c .]„ = pl'- y| . (42) 

Covariance models of this form have been widely applied in image 
processing studies. 14 " 18 Expressions for the eigenvalues and eigenvec- 
tors of Mr (or Mo) are given in Ref. 18. 

We now apply this model to a study of the covariance properties of 
G n (q), n = 0, 1, • ■ • M - 1. From (9) we have 

<ft n R\2<t>m <J>/ii?2</>m 



G n (q)Gl(q) = 



(43) 



As described in Section III, matrices Ri, R 2 , and R i2 are auto- and 
cross-correlation matrices of the spatial derivatives of /. For the 
discrete image model specified by (40), we compute these spatial 
derivatives as the corresponding spatial differences. For example, 
derivatives in the row direction of the raster are given by 
(I m ,n - I m ,n-i)(Iij - hj-i)- Expanding this product and using (40) gives 

(Im.n ~ Un-lHhj ~ hj-,) = a» P M WcP^ + M*], (44) 

where a< = 2 - p7 l - p c , f3 c = p7 l - pc, and 8 nJ is the Kronecker delta 
function. By comparing (40) and (41) with (44), we have 

Ri = a 2 M r X [ctcMc + r3cU], (45) 

where U is an identity matrix. 

Similarly, with a r = 2 - p7 l - pr and fi r = p7 l - p r : 

R 2 = a\a r M r + firU] X M c (46) 

and 

Rn = a 2 [a r M r + /? r U] X [a c M c + f3 c U\ (47) 
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As shown in Ref. 13, the eigenvectors of a Kronecker product A X 
B have the form 

xiy J 



x l N y J 

where x'k, k = 1, 2, • • • , N, denotes the components of the eigenvector 
x' of A and y y denotes an eigenvector of B. Since any vector is an 
eigenvector of an identity matrix, it follows that the eigenvectors of 
R\, R 2 , and Rn above are in fact identical to the eigenvectors of the 
image co variance matrix (41). The normalized eigenvectors form a 
complete orthonormal set and are considered to be the optimum bases 
for transform coding image blocks modeled by (41). Selecting {<>„} to 
be this set of eigenvectors, it follows that all terms in the matrix of 
(43) will be identically zero for n¥= m, which establishes the statistical 
orthogonality of the G n (q). 

APPENDIX B 

This appendix shows that mean displacement estimation error for 
dependent G (( ,»([[i]]) is approximately given by recursion (19) for 
small e. 

Iterating (18) yields 

A. + i = ( II [U- eG« y ))([[y]])G5 y ,)([L/]])]jAo. (48) 

The matrix product premultiplying A,, in (48) is a function of €. Taylor's 
expansion of this function about e = yields (for fixed i): 



A« + i = ( U- e n G ((y)) ([[>]])G^ „([[>]]) )A + 0(e 2 ) 

7-0 



so that 



i*i = (U- e £ G tun GTun Uo + 0(e 2 ). 



y=0 



(49) 



(50) 



On the other hand, repeating the steps of (48) and (49) on (19) gives 
(for the independent G«, »([[i]]) case): 



iw = U " € I GujnGTvn Uo + 0'(e 2 ). 



y=0 



(51) 



Comparing (50) and (51) we have the result that mean estimation 
errors for the dependent and independent cases are equal to an 
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approximation 0(e 2 ) - 0'(e 2 ), which can be neglected for sufficiently 
small e. 

APPENDIX C 

This appendix briefly sketches the steps leading to (39). 
We consider the successive frames of video to consist of a moving 
edge contaminated by independent noise: 

/(x, t) = g{x x -D) + w(x, t) 

I(x, t-r)=gx x + w(x, t - t), (52) 

where the noise w{- , •) is white with respect to both pel-to-pel and 
frame-to-frame dimensions. An analysis similar to that leading to (18) 
then yields 

A,- +1 = A, - e[G H (q) + V[W T (x [m] - D,., - t)*„]] 

.[G„ r ( 9 )A, + W T (xj mi , t)ki - W T (x l[n] - D„ t - t)«M- (53) 

We consider iterating (53) repeatedly, using dc-basis vector <>o = 
1/y/N and G n (q) given by (38). Let Mi) denote the displacement error 
in the horizontal direction at iteration i; Z\(i) denote the difference of 
the two noise terms in the final term of (53); and Z 2 (i) denote the 
horizontal component of the noise gradient term in (53). Then the 
horizontal component of (53) becomes 

Mi + 1) = A(i) - eNg 2 [Mi) + Zi(i)/</Ng][l + Z 2 (i)/y/Ng]. (54) 
Defining /3 = eNg 2 , y = 1/yfNg and rearranging terms yield 

Mi + 1) = [1 - 0(1 + yZ 2 (i))]Mi) - /iyd + yZ 2 (i))Z l (i). (55) 

Note that each noise term in (55) is multiplied by a factor y which, 
for given /?, decreases as I/Vn. Neglecting dependencies between 
{Z 2 (i)}, and (Zi(i')}, this equation is linear with respect to input Z\ 
and output A. The solution has the form 

Mi) = A(0) J] [1 " >»<1 + yZiU))] + I h(i, k)Zdk), (56) 

7=1 *-0 

where h{i, k) is the response of (55) for A(0) = and input Z\{i) = S ik . 
By assuming that {Z\(i)} and {Z 2 (i)} are mutually independent white 
sequences, the mean and variance of Mi) can now be derived by a 
tedious but conventional analysis, resulting in (39). 
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